Data Management and Visualisation Project: Forest Research

The Growth of Woodland Area within the United Kingdom (1998-2024)

Data Source

This project is focused on visualising various changes in woodland within the United Kingdom between March 31st of 1998 and 2024 respectively. The raw data and logo for this project were retrieved from ‘Forest Research’, a branch of the Forestry Commission (FC) which collects data by totaling extractions made from multiple forest services operating within the United Kingdom. It principally gathers information on tree-related statistics, including but not limited to gathering longitudinal provisional data on the estimated amount of woodland area in the UK.

Alongside data on privately owned woodland, provisional estimates have been retrieved from land within the public sector - including Forest England (FE), Forest and Land Scotland (FLS), Natural Resources Wales (NRW) and Forest Service (FS) (for Northern Ireland).

The “Woodland area, UK, 1998 to 2024” ODS file contains 5 sheets of actual data, though was supplemented with cover, content and notes pages. In tandem, the pages containing data were annotated with further notes clarifying the origin of the data and different commissions involved in gathering it. The pages containing data split woodland area based on whether it was coniferous or deciduous, private or public sector, and also summated total estimated woodland per country and in the United Kingdom as a whole.

Full Data Reference

Forest Research (2024, March 31). Tools and Resources: Data Downloads. https://www.forestresearch.gov.uk/tools-and-resources/statistics/data-downloads/


Glimpse of the Raw Data …

## Should the below list of packages need to be installed, remove '#'(s) and run:
#libraries<-c("tidyverse", "cowplot", "magick", "readODS", "here", "plotly", "highcharter", "htmlwidgets")
#install.packages(libraries, repos="http://cran.rstudio.com")

## NECESSARY PACKAGES:

library(here)         # to create user-specific filepaths
library(readODS)      # to retrieve data from ODS files
library(tidyverse)    # for various visualisation packages
library(cowplot)      # for various visualisation options
library(magick)       # to import images onto visualisations
library(plotly)       # to create interactive scatterplots
library(highcharter)  # to create interactive barplots
library(htmlwidgets)  # to save interactive visualisations

## DATA PREPARATION:
# Extraction of raw data from ODS file

country_names<-c("England","Wales","Scotland","Northern Ireland")
pathway<-paste0(here("raw_secondary_data", "area-timeseries-20jun24.ods"))
countries_list<-list()
for (i in seq_along(country_names)){
  countries<-country_names[i]
  countries_list[[countries]]<-read_ods(pathway, sheet = i+3)
} # this loop extracts all of the raw data from the country-specific sheets and imports it into R
raw_extracted_data<-data.frame(countries_list) # which is then put into a dataframe

Processed Data …

# Processing and cleaning of data

processed_extracted_data<-raw_extracted_data[-c(1:3),] # removes unnecessary text
names(processed_extracted_data)<-as.matrix(processed_extracted_data[1,]) # labels each column by their original titles
processed_extracted_data<-processed_extracted_data[-1,] # removes text names from the data
names(processed_extracted_data)<-make.unique(names(processed_extracted_data)) # makes each column unique
processed_extracted_data<-processed_extracted_data %>% select(-starts_with("Note")) # removes erroneous "note" columns
processed_extracted_data<-processed_extracted_data%>% mutate_if(is.character, as.numeric) # converts all variables into numeric form
rownames(processed_extracted_data) <- NULL # corrects for the erroneous row numbers extracted when subsetting from the raw data

Full annotated scripts describing the step-by-step processing of the raw data are available within the appropriately named Forest Research repository on GitHub: https://github.com/JacobMoorcroft/ForestResearch



The Research Question(s)

The main rationale for exploring this data was to elucidate whether and where there has been growth in woodland area across the United Kingdom from 1998 to 2024. This included creating a static visualisation of absolute numeric woodland amounts, annotated with the rate (percentage increase) of development from 1998-2024, as well as more advanced interactive visualisations which showed specific patterns subject to being coniferous or deciduous, and public or private sector woodland.

For the below plot, calculating the percentage increase was important to contextualise the proportional rate of growth for each country, beyond simply interpreting development based on raw numeric increase in hectares of woodland. Following this up with interactive visualisations allowed more precise dissemination of whether patterns of development were distinct depending on tree type (i.e. coniferous or deciduous) and woodland ownership (i.e. public or private sector).



Static Plot (using ggplot2)

The below plot shows with visual clarity that the growth of woodland area within the United Kingdom can be interpreted very differently based on using absolute numeric increase or proportional rates of change. From this, reporting either of these measures outside of the context of the other could be misleading - i.e. interpreting that England or Scotland have made superior woodland development efforts compared to Northern Ireland, in the context that they have gained considerably higher amounts of woodland region, neglects how Northern Ireland has shown considerably greater proportional growth through percentage increase in native woodland from 1998-2024. Regardless, the data and accompanying visual can simply be interpreted on the promising notion that all countries within the United Kingdom have shown woodland growth between 1998 and 2024.

# Mutual code for all visualisations
fig_path<-here("figs") # creates the necessary path for saving the figures
logo_file<-paste0(here("logo","Picture1.jpg")) # creates the path for applying the logo
year_ending_March_31st<-processed_extracted_data$`Year ending 31 March` # timeframe for visualisations

# This first plot, aptly named 'Basic Plot', was created while at the University of Sheffield as part of initial novice attempts at writing code, examined during the PSY6422 Module on the 'Psychological Research Methods with Data Science' MSc Course (2023-24).

## DATA EXTRACTION: 
# Extraction of necessary variables from processed dataframe

country_names<-c("England", "Wales", "Scotland", "Northern Ireland")
woodland_area<-select(processed_extracted_data, ends_with(" total (thousand ha)")) # Total Woodland Area
for (country in country_names){
  assign(country, woodland_area[[paste0(country, " total (thousand ha)")]])
} # creates numeric variables for the woodland area of each country, to be labelled per year

woodland_area<-data.frame(England,Wales,Scotland,`Northern Ireland`) # which is then put back into the dataframe
woodland_area<-woodland_area%>% rename(Northern_Ireland=`Northern.Ireland`) # *corrects for interaction issues now caused by spacing in `Northern Ireland`
country_names<-c("England", "Wales", "Scotland", "Northern_Ireland") # *

woodland_by_year<-cbind(year_ending_March_31st,woodland_area) # Total Woodland Area, By Year

# Calculation of proportional change in woodland area per country
percentage_results<-data.frame(country=character(),percentage_increase=numeric(),stringsAsFactors=FALSE) # creates an empty dataframe
for(country in country_names){
  min_v<-min(woodland_area[[country]])
  max_v<-max(woodland_area[[country]])
  percentage_increase<-(((max_v-min_v)/min_v)*100)
  percentage_results<-rbind(percentage_results, data.frame(country=country,percentage_increase=percentage_increase))
} # this loop calculates the percentage increase in woodland area of each country from 1998-2024
percentage_results[1:4,2]<-round(percentage_results[1:4,2],2) # then rounds the data to 2 decimal places

## FINAL DATAFRAME:
woodland_growth_over_time<-data.frame(
  country=c(rep("England",27),rep("Wales",27),rep("Scotland",27),rep("Northern Ireland",27)),
  woodland=c(woodland_by_year$England,woodland_by_year$Wales,
             woodland_by_year$Scotland,woodland_by_year$Northern_Ireland),
  year=c(woodland_by_year$year_ending_March_31st)) # creates a final dataframe amenable to the upcoming visualisation

mapping<-aes(x=year,y=woodland,colour=country) # creates the mapping for the visualisations

## BASIC STATIC PLOT:
# Creates a plot mapping the amount of woodland area development from 1998-2024, as divisable by country, and as annotated with percentage increase

BasicPlot<-woodland_growth_over_time %>%
  ggplot(mapping=mapping)+
  geom_smooth(method="gam")+
  labs(x="Year (ending March 31st)",
       y="Woodland area (in thousand hectares)",
       colour="Country",
       title="The Growth of Total Woodland Area within the United Kingdom",
       subtitle="As annotated with percentage increase from 1998-2024",
       caption="Data retrieved from: Forest Research, 2024")+
  annotate("label",x=2010,y=1210,label=paste(percentage_results[1,2],"%"),colour="#EE0000",size=3,fontface="bold")+
  annotate("label",x=2010,y=375,label=paste(percentage_results[2,2],"%"),colour="#00CD00",size=3,fontface="bold")+
  annotate("label",x=2010,y=1450,label=paste(percentage_results[3,2],"%"),colour="#0000CD",size=3,fontface="bold")+
  annotate("label",x=2010,y=175,label=paste(percentage_results[4,2],"%"),colour="#FFA500",size=3,fontface="bold")+
  scale_colour_manual(values=c(England="#EE0000",Wales="#00CD00",
                               Scotland="#0000CD",`Northern Ireland`="#FFA500"))+
  scale_x_continuous(breaks=seq(1998,2024,5))+
  scale_y_continuous(breaks=seq(0,1500,150))+
  theme(panel.border=element_rect(colour="#8B7355",fill=NA,linewidth=2),
        panel.grid.minor=element_line(colour="#8B7355",linewidth=0.5),
        panel.grid.major=element_line(colour="#CAFF70",linewidth=0.7),
        panel.background=element_rect(fill="#FFFFF0"),
        axis.line=element_line(linewidth=2,colour="#8B7355"),
        plot.title=element_text(face="bold"),
        plot.subtitle=element_text(face="italic"),
        text=element_text(family="serif"),
        legend.title=element_text(face="bold"),
        legend.box.background=element_rect(colour="#8B7355"),
        legend.box.margin=margin(1,1,1,1),
        legend.key=element_rect(colour="#8B7355"))

# To demonstrate the ability to do so, this code adds the official logo to sit alongside the data source

BasicPlot<-ggdraw(BasicPlot)+
  draw_image(logo_file, scale=.2,x=1,hjust=1,halign=1,valign=0)

## Visualisation of the Growth of Total Woodland Area within the United Kingdom, from 1998 to 2024

BasicPlot


While this plot was initially taken from my first ever Data Science Project completed during my Master’s studies, the following two interactive figures (created using Plotly) are a more recent attempt at configuring and advancing upon my previous abilities with data management and visualisation. Also, now that we’ve already gone through the process of showing a general composition of woodland development from 1998-2024, it could be curious to investigate more nuanced and dissectable patterns in the data relevant to land ownership and tree type.



Interactive Plot(s) (using plotly)

Generally, the below scatterplots indicate that private woodland area has developed at a much more visually noticeable rate than public woodland, even after accounting for the differences in initial proportions by creating graphs with relative scales. Further exploration into why this has happened (i.e. privatisation of forested areas for production purposes, higher funding for development, sawmill locations, etc.) could be interesting should this want to be approached in a more nuanced way than just a visualisation project. Regardless, it was entertaining to play around with the data to show this.

Public Sector Woodland

# DATA EXTRACTION: Woodland Tree Types on PUBLIC land ...

public_trees<-data.frame(
  tree_type=c(rep("FE conifers, England",27),rep("FE broadleaves, England",27),rep("NRW conifers, Wales",27),rep("NRW broadleaves, Wales",27),
              rep("FLS conifers, Scotland",27),rep("FLS broadleaves, Scotland",27),rep("FS conifers, NI",27),rep("FS broadleaves, NI",27)),
  woodland=c(processed_extracted_data$`FE conifers (thousand ha)`,processed_extracted_data$`FE broadleaves (thousand ha)`,
             processed_extracted_data$`NRW conifers (thousand ha)`,processed_extracted_data$`NRW broadleaves (thousand ha)`,
             processed_extracted_data$`FLS conifers (thousand ha)`,processed_extracted_data$`FLS broadleaves (thousand ha)`,
             processed_extracted_data$`FS conifers (thousand ha)`,processed_extracted_data$`FS broadleaves (thousand ha)`),
  year=c(`year_ending_March_31st`)) # creates a dataframe amenable to the upcoming visualisation

public_trees$tree_type<-factor(public_trees$tree_type, levels=c("FE conifers, England", "FE broadleaves, England", 
                                                                  "NRW conifers, Wales", "NRW broadleaves, Wales",
                                                                  "FLS conifers, Scotland", "FLS broadleaves, Scotland",
                                                                  "FS conifers, NI", "FS broadleaves, NI")) # this code facilitates consistent ordering of variables in visualisation legends

mapping<-aes(x=year,y=woodland,colour=tree_type) # creates the mapping for the visualisations

## BASIC INTERACTIVE PLOT 1:

# Generates an interactive scattergraph of documented PUBLIC coniferious and deciduous woodland area within the United Kingdom from 1998-2024

InteractivePlot_public<-public_trees %>%
  ggplot(mapping=mapping)+
  geom_point(aes(text = paste("Woodland Area:", woodland, "(K ha)",
                               "<br>Year:", year, "<br>Type & Region:", tree_type), shape=tree_type))+
  labs(title="The Growth of Public Woodland Area in the United Kingdom from 1998 to 2024",
       x="Year (ending March 31st)",
       y="Woodland area (in thousand hectares)",
       colour="Woodland Type and Region",
       shape=NULL)+
  scale_colour_manual(values=c(`FE conifers, England`="#FF0000",`FE broadleaves, England`="#FF0000",
                               `NRW conifers, Wales`="#00CD00",`NRW broadleaves, Wales`="#00CD00",
                               `FLS conifers, Scotland`="#0000CD",`FLS broadleaves, Scotland`="#0000CD",
                               `FS conifers, NI`="#FFA900",`FS broadleaves, NI`="#FFA900"))+
  scale_shape_manual(values=c(0,15,1,16,2,17,5,18))+
  scale_x_continuous(breaks=seq(1998,2024,2))+
  scale_y_continuous(breaks=seq(0,1000,50))+
  theme(panel.border=element_rect(colour="#8B7355",fill=NA,linewidth=2),
        panel.grid.major.x=element_line(colour="#8B7355",linewidth=0.2),
        panel.grid.major.y=element_line(colour="#CAFF70",linewidth=0.2),
        axis.line=element_line(linewidth=4,colour="#CAFF70"),
        panel.background=element_rect(fill="white"),
        plot.title=element_text(face="italic"),
        text=element_text(family="sans",size=12))

# Unfortunately, the logo cannot be added to an interactive plot as this feature has not yet been implemented into plotly

# As proper preparations and modifications have already been made, the plot can now instantly be made interactive

InteractivePlot_public<-ggplotly(InteractivePlot_public, tooltip="text",width=1000,height=600) %>%
  layout(margin = list(t = 50, r = 50, b = 50, l = 50))

## Visualisation of the Growth of Coniferous and Deciduous Public Woodland Area within the United Kingdom, from 1998 to 2024

InteractivePlot_public



Private Sector Woodland

# DATA EXTRACTION: Woodland Tree Types on PRIVATE land ...

private_trees<-data.frame(
  tree_type=c(rep("Private sector conifers, England",27),rep("Private sector broadleaves, England",27),rep("Private sector conifers, Wales",27),rep("Private sector broadleaves, Wales",27),
              rep("Private sector conifers, Scotland",27),rep("Private sector broadleaves, Scotland",27),rep("Private sector conifers, NI",27),rep("Private sector broadleaves, NI",27)),
  woodland=c(processed_extracted_data$`Private sector conifers (thousand ha)`,processed_extracted_data$`Private sector broadleaves (thousand ha)`,
             processed_extracted_data$`Private sector conifers (thousand ha).1`,processed_extracted_data$`Private sector broadleaves (thousand ha).1`,
             processed_extracted_data$`Private sector conifers (thousand ha).2`,processed_extracted_data$`Private sector broadleaves (thousand ha).2`,
             processed_extracted_data$`Private sector conifers (thousand ha).3`,processed_extracted_data$`Private sector broadleaves (thousand ha).3`),
  year=c(`year_ending_March_31st`)) # creates a dataframe amenable to the upcoming visualisation

private_trees$tree_type<-factor(private_trees$tree_type, levels=c("Private sector conifers, England", "Private sector broadleaves, England", 
                                                                  "Private sector conifers, Wales", "Private sector broadleaves, Wales",
                                                                  "Private sector conifers, Scotland", "Private sector broadleaves, Scotland",
                                                                  "Private sector conifers, NI", "Private sector broadleaves, NI")) # facilitates consistent legend order

## BASIC INTERACTIVE PLOT 2:

# Generates an interactive scattergraph of documented PRIVATE coniferious and deciduous woodland area within the United Kingdom from 1998-2024

InteractivePlot_private<-private_trees %>%
  ggplot(mapping=mapping)+
  geom_point(aes(text = paste("Woodland Area:", woodland, "(K ha)",
                              "<br>Year:", year, "<br>Type & Region:", tree_type), shape=tree_type))+
  labs(title="The Growth of Private Woodland Area in the United Kingdom from 1998 to 2024",
       x="Year (ending March 31st)",
       y="Woodland area (in thousand hectares)",
       colour="Woodland Type and Region",
       shape=NULL)+
  scale_colour_manual(values=c(`Private sector conifers, England`="#FF0000",`Private sector broadleaves, England`="#FF0000",
                               `Private sector conifers, Wales`="#00CD00",`Private sector broadleaves, Wales`="#00CD00",
                               `Private sector conifers, Scotland`="#0000CD",`Private sector broadleaves, Scotland`="#0000CD",
                               `Private sector conifers, NI`="#FFA900",`Private sector broadleaves, NI`="#FFA900"))+
  scale_shape_manual(values=c(0,15,1,16,2,17,5,18))+
  scale_x_continuous(breaks=seq(1998,2024,2))+
  scale_y_continuous(breaks=seq(0,1000,100))+
  theme(panel.border=element_rect(colour="#8B7355",fill=NA,linewidth=2),
        panel.grid.major.x=element_line(colour="#8B7355",linewidth=0.2),
        panel.grid.major.y=element_line(colour="#CAFF70",linewidth=0.2),
        axis.line=element_line(linewidth=4,colour="#CAFF70"),
        panel.background=element_rect(fill="white"),
        plot.title=element_text(face="italic"),
        text=element_text(family="sans",size=12))

InteractivePlot_private<-ggplotly(InteractivePlot_private, tooltip="text",width=1000,height=600) %>%
  layout(margin = list(t = 50, r = 50, b = 50, l = 50))

## Visualisation of the Growth of Coniferous and Deciduous Private Woodland Area within the United Kingdom, from 1998 to 2024

InteractivePlot_private



Interactive Plot (using highcharter)

As a final tidbit, below I have created a simple interactive barplot (using highcharter) which summates all of the recorded private and public woodland statistics to show a more general trend in coniferous and deciduous woodland within the United Kingdom from 1998-2024. This could also be interpreted as a more visually advanced version of our original static plot, despite it actually being much simpler to create!

# DATA MANIPULATION: Woodland Tree Types on both PUBLIC and PRIVATE land ...

total_trees<-cbind(public_trees,private_trees) # combines all data into one dataframe
names(total_trees)<-make.names(names(total_trees), unique=TRUE) # renders names unique to allow renaming
total_trees<-total_trees%>% 
  rename(private_woodland=`woodland.1`,public_woodland=woodland)%>% # renames variables to facilitate interaction
  mutate(sources=paste(tree_type,`tree_type.1`, sep=" + "))%>% # creates variable to record source of woodland area
  select(-c(1,4,6))%>% # removes now unneccessary variables from the dataframe
  select(c("sources", "year", "public_woodland", "private_woodland")) # reorders dataframe to improve readability
total_trees$total_woodland<-rowSums(total_trees[, c("public_woodland","private_woodland")], na.rm=TRUE) # calculates total woodland areas

visual_theme<-hc_theme( # creates an aesthetically similar "woodland"-esque theme to the prior visualisations
  colors=c("#FF0000","darkred","#00CD00","darkgreen","#0000CD","darkblue","#FFA900","darkorange"),
  chart=list(plotBackgroundColor="beige",
             style=list(fontFamily="Helvetica Neue", fontSize="15px")),
  title=list(style=list(fontWeight="bold",fontSize="20px")),
  yAxis=list(gridLineColor="#8B7355"))

# BASIC INTERACTIVE PLOT 3:

# Generates an interactive stacked barplot using the 'highcharter' rather than 'plotly' package, showing totals of coniferious and deciduous woodland area within the United Kingdom from 1998-2024

amalgamated_visualisation<-total_trees %>% 
  hchart('column', hcaes(x=year,y=total_woodland,group=sources),stacking="normal")%>%
  hc_title(text="Documented Woodland Area of the UK from 1998-2024")%>%
  hc_subtitle(text="Segmented by tree type (coniferous or deciduous) and country")%>%
  hc_xAxis(title=list(text="Year (ending March 31st)"))%>%
  hc_yAxis(title=list(text="Woodland area (in thousand hectares)"))%>%
  hc_add_theme(visual_theme)%>%
  hc_caption(text="Northern Ireland only began documenting woodland area types as of 2005: see abline")%>%
  hc_annotations(list(shapes=list(list(type = 'path',points = list(list(xAxis = 0, yAxis = 0, x = 2004.5, y = 0),
                                                                   list(xAxis = 0, yAxis = 0, x = 2004.5, y = 4000)),
                                       stroke="black",strokeWidth=1)),draggable=""))

# Trend in RECORDED woodland area (i.e. as 1998-2004 NI woodland area was not documented) in the United Kingdom from 1998-2024.

amalgamated_visualisation



Notes

As an ode to further exploration, it must again be clarified that this project has been purely for the purpose of data visualisation - not analysis. Thusly, no definitive conclusions can be made about whether any of the visual idiosyncrasies occuring across the figures hold any inherent value to forming conclusions about woodland development. However, though they cannot determine what or how a significant change in woodland may occur, they are exceptionally useful for investigating where these differences may exist.

In tandem, further visualisation opportunities lie in compiling data available within the other ODS files available from the fascias of the Forestry Commission - i.e. such as how the amount of sawmills and wood production has changed from 1998-2024 - to investigate questions such as whether the rate of woodland development is sustainable to the rate of wood production by sawmills in the United Kingdom.

Beyond the constraints of the available data, taking into account the actual geographic size of each country could also be highly insightful for mapping observed changes in relation to total land available, and the percentage of each country’s landmass which is recorded as woodland area.